Abstract

Vehicle self-localization by visual-scene matching is
currently an active research topic in the robotics and
computer vision communities. The major challenge
of this problem is the large change in the appearance
of any place between different hours of the day and
between different seasons. Recently, many research
efforts have tried to tackle this problem by integrating
an illumination-invariant image matching method with a
robust data fusion algorithm. This article introduces
an accurate vehicle self-localization algorithm that
relies on our novel Random Local Difference Binary
(RLDB) image descriptor for image matching, and
on a Markov filter as the data fusion algorithm. The
article presents experimental work that compares
the proposed algorithm with other modern vehicle
visual-localization algorithms. The results show that
our algorithm outperforms these algorithms, with
higher localization accuracy and a faster cycle rate.

Keywords: Computer vision, Autonomous vehicles,
Visual navigation, Markov filter.

1 Introduction 

Self-localization of vehicles and robots by visual-scene
matching is currently an active research topic. This
problem assumes that there is a pre-stored database
that contains a large number of images covering all
places in the navigation area. Each database image is
augmented with a geo-tag that holds the geographical
coordinates of the location where the image was
captured. During the real-time localization phase, the
vehicle carries a camera that captures images frequently.
Each of these real-time images is matched with the
database images. Once the best database match is
found, its geo-tag is used to estimate the current
location of the vehicle.

The importance of this problem is attributed to
the current trend of producing reliable self-driving
cars. Although most current autonomous vehicle
projects rely on the Global Positioning System
(GPS) and Light Detection And Ranging (LiDAR)
for localization, many recent research efforts introduce
visual-scene matching as an alternative or complementary
method [3, 14, 5]. In some circumstances, visual
localization is even more reliable than GPS, for example
during navigation in places with high terrain or tall
buildings that distort GPS signals, and in places where
GPS signals are completely absent, like tunnels and
indoor places. Visual-scene matching is also attractive
for the localization problem because it depends on a
low-cost sensor that has low weight and low power
consumption.

The first major challenge in the visual localization
problem is the large appearance change that any
outdoor scene undergoes over the hours of the day and
between different seasons. The extreme case of
appearance change is matching day and night images of
the same place. To overcome this challenge, an image
descriptor that is robust against illumination and
appearance changes is required. The second major
challenge is the large percentage of matching outliers
generated in the image matching process. Even the most
accurate image descriptors cannot completely eliminate
outliers, and such outliers cause mislocalization. To
overcome this second challenge, a secondary measurement
source, like an odometry sensor, has to be integrated.
An accurate data fusion algorithm is also required to
estimate the expected vehicle location based on the two
available measurement sources.

This article presents an accurate and fast statistical
algorithm for vehicle visual localization. It relies on
our novel RLDB binary image descriptor, introduced in
[6]. RLDB is an accurate, appearance-invariant image
descriptor that is suitable for matching outdoor images.
The proposed localization algorithm also uses a Markov
localization filter to integrate odometry sensor
measurements with RLDB image matching results. The
experimental work presented in the article demonstrates
that the proposed localization algorithm outperforms
the current state-of-the-art vehicle visual localization
algorithms.

The main contributions of this article are:

1) Building a precise vehicle self-localization algorithm
that is more accurate than state-of-the-art algorithms.

2) Proposing an approximation of the Markov localization
filter that decreases its computational cost while
keeping almost the same localization accuracy.

The remainder of the paper is organized as follows.
Section 2 reviews previous research on vehicle
visual localization. Section 3 reviews the RLDB image
descriptor. Section 4 introduces the Markov localization
filter and suggests a modification that reduces its
computational complexity. Section 5 presents the
experimental work. Section 6 discusses the achieved
results, and section 7 concludes the article.

2 Literature review 

Fast Appearance-Based Mapping (FAB-MAP) [8] is
one of the classical successful visual localization
algorithms. It matches images using local image
features, and it has a training phase. Since it relies
on local image features, it performs poorly in
the presence of illumination and appearance changes
[11, 12]. More recent algorithms suggest using
whole-image matching to gain robustness against
appearance changes. Sequence Simultaneous Localization
And Mapping (SeqSLAM) [10] is one of these algorithms.
It matches images using the Sum of Absolute
Differences (SAD) on a down-scaled (64x64 pixels),
illumination-normalized version of the original images.
Instead of finding the global match for the current
single real-time image only, SeqSLAM matches a complete
sequence of successive real-time images with the
database image sequence. This technique gives it
robustness against matching outliers. The major
drawback of SeqSLAM is that it restricts the vehicle
speed to be the same as the speed of the vehicle from
which the database image sequence was captured. Such a
restriction is unrealistic in practical robotics
applications.

The restriction of SeqSLAM was relaxed by the
same team in Sequence Matching Across Route
Traversals (SMART) [14]. This algorithm incorporates
an odometry displacement sensor to capture
successive images separated by a fixed displacement
distance, instead of a fixed time interval. This
modification enables matching two image sequences
regardless of the relative vehicle speeds while
capturing each of them. A further development
of SMART was introduced in [15]. Two side-looking
cameras were used simultaneously, and multi-scale
image matching was used to gain robustness against
lateral vehicle translation. Both SMART and SeqSLAM
use SAD preceded by illumination normalization for
image matching. This method achieves better matching
accuracy than image matching based on local
descriptors. However, the major drawback of SMART
is its large computational cost [16].

Using a binary image descriptor is an alternative image
matching method that has a low descriptor computation
time and a low matching time. Binary image
descriptors were originally developed for hand-held
devices, like tablets and smart phones, that have
low computational and memory capabilities. The
descriptor is a binary string of fixed length, and
matching is done by computing the Hamming distance
between descriptors. No floating-point operations
are required to compute the descriptor or to
match it, which explains the low computational cost.
Moreover, their image matching accuracy is competitive
with the classical vector-based image descriptors [5].

Able for Binary-appearance Loop-closure Evaluation
(ABLE) [4] is a modern vehicle localization algorithm
that relies on a binary image descriptor. It uses
the Local Difference Binary (LDB) descriptor [18, 19]
for image matching. The descriptor is computed for
an illumination-normalized, down-scaled version of the
original images. Similar to SeqSLAM and SMART,
it matches a complete sequence of successive real-time
images with the database image sequence. This generates
a distance matrix in which each element is the
Hamming distance between one image from the real-time
image sequence and one image from the database image
sequence. Localization is done by searching the
distance matrix for a line inclined at an angle of
\pi/4 with small difference values. This localization
method is the major drawback of the ABLE algorithm,
because of its low accuracy compared to statistical
localization methods and because of its large
computational cost [16].

Localization using statistical filters, like the Markov
filter, particle filter, and Kalman filter, is more
accurate and faster than distance-matrix based
localization. Particle-SMART [16] is a development of
the SMART algorithm that uses a particle filter for
localization while it keeps using SAD for image
matching. This algorithm is faster and more accurate
than the original SMART algorithm. However, its image
matching method is still much slower than algorithms
based on binary descriptors. The research presented in
[7] outperforms Particle-SMART by using binary
descriptors for fast image matching and a
multi-hypothesis Markov filter for statistical data
fusion.

Using Convolutional Neural Networks (CNN) to
extract image features for place recognition is an
alternative choice. NetVLAD [3] is currently the
state-of-the-art CNN-based algorithm for place
recognition by scene matching. It integrates a CNN with
the Vector of Locally Aggregated Descriptors (VLAD)
method [9]. Despite the accurate results achieved by
this method, it has a large computational cost, which
makes it unsuitable for real-time applications with
a high frame rate, like the vehicle localization problem.

The algorithm presented in this article is a
novel vehicle localization algorithm that is faster and
more accurate than the other algorithms it is compared
with. It uses our RLDB binary image descriptor [6] for
image matching, which gives the algorithm robustness
against appearance and illumination changes. A Markov
filter is used as the data fusion algorithm to integrate
odometry sensor measurements with image matching
results, in order to get an accurate estimate of the
location of the vehicle.


3 RLDB descriptor review 

LDB is a binary image descriptor originally developed
for devices with low memory and processing
power, like smart phones and tablets. It is
a local image descriptor, so it is preceded by a feature
extraction algorithm, like Features from Accelerated
Segment Test (FAST), that detects the locations
of the interest points in the image. LDB is then used
to generate a binary string descriptor for each of these
interest points. In ABLE, however, LDB is used as a
global image descriptor that computes a single binary
string describing the whole image. The original LDB
divides the image using grids of square cells with
different resolutions, for instance 2x2, 3x3, 4x4, and
5x5 grids for an LDB with 4 resolution levels. Three
representative functions are computed for each cell: the
cell average intensity I, and the cell average intensity
gradients in the horizontal and vertical directions, dx
and dy, respectively. The image descriptor is computed
by randomly selecting a fixed number of cell pairs and
comparing the representative functions of each pair,
which generates 3 bits per pair. All generated bits are
concatenated to form the final binary descriptor.

RLDB is a modification of the LDB descriptor
that enhances its image matching accuracy, especially
in the presence of a large appearance change between
the matched images. It replaces the cell average
intensity-gradient functions dx and dy by the cell
average absolute intensity-gradient functions |dx| and
|dy|, which are more robust against illumination and
appearance changes. RLDB also replaces the linearly
growing cell grids with randomly selected image cells
that have random sizes and random locations. RLDB
achieves better image matching accuracy than LDB. A
performance comparison between RLDB and LDB is found in
[6], which also presents precision-recall curves that
study the stability of RLDB with respect to inliers and
outliers. The code that implements the experiments
studying RLDB performance is publicly available at [2].

Figure 1 illustrates how the RLDB descriptor is computed
for an illumination-normalized image. Figure
1(a) shows square cells of random sizes that are
randomly spread over the image area. Pairs of these cells
are selected for comparison of their representative
functions. In the figure, each cell pair is marked by a
line connecting the centers of the two image cells.
Figure 1(b) shows the comparison between the
representative functions of two image cells, which
generates three binary bits.



Figure 1: (a) Randomly selected cell pairs. Each
cell pair is marked by a line connecting the centers of
the two cells. (b) Comparison between the representative
functions of two image cells (the binary test), which
generates three binary bits.


In the experimental work of this article, we used
the same implementation of RLDB presented in [6],
except that we add a new restriction to compare only
cells of the same size. This modification aims at
reducing the computational cost, since it eliminates the
need to normalize the representative functions
over the cell area, which saves three floating-point
division operations per image cell.
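
The descriptor construction described in this section can be sketched as follows. This is a minimal illustration, not the implementation from [6]: it assumes pixel-wise forward differences for the gradients, a "greater than" binary test, and cell sizes between 2 and 8 pixels, and the function names are hypothetical. Because both cells of a pair share the same size, sums are compared directly and no division by the cell area is needed.

```python
import random

def cell_sums(img, r, c, s):
    """Sum of intensity, |dx| and |dy| over the s-by-s cell with top-left corner (r, c)."""
    total = abs_dx = abs_dy = 0.0
    for y in range(r, r + s):
        for x in range(c, c + s):
            total += img[y][x]
            abs_dx += abs(img[y][x + 1] - img[y][x])  # horizontal forward difference (assumed)
            abs_dy += abs(img[y + 1][x] - img[y][x])  # vertical forward difference (assumed)
    return total, abs_dx, abs_dy

def random_cell_pairs(height, width, n_pairs, min_size=2, max_size=8, seed=0):
    """Draw pairs of same-size square cells at random locations (size range is an assumption)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        s = rng.randint(min_size, max_size)
        # Keep a one-pixel margin so the forward differences stay inside the image.
        a = (rng.randrange(height - s), rng.randrange(width - s), s)
        b = (rng.randrange(height - s), rng.randrange(width - s), s)
        pairs.append((a, b))
    return pairs

def rldb_like_descriptor(img, pairs):
    """Three bits per cell pair: one for intensity, one for |dx|, one for |dy|."""
    bits = []
    for (ra, ca, s), (rb, cb, _) in pairs:
        fa = cell_sums(img, ra, ca, s)
        fb = cell_sums(img, rb, cb, s)
        bits.extend(1 if u > v else 0 for u, v in zip(fa, fb))
    return bits

# Usage on a small synthetic 16x16 image.
image = [[(3 * y + 5 * x * x) % 17 for x in range(16)] for y in range(16)]
pairs = random_cell_pairs(16, 16, n_pairs=10, max_size=6)
descriptor = rldb_like_descriptor(image, pairs)
print(len(descriptor))
```

The same list of cell pairs has to be reused for every database and real-time image, otherwise the resulting bit strings are not comparable.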
4 Vehicle localization

In the vehicle localization problem, it is assumed that
there is a pre-stored database that contains images of
different locations covering the whole navigation area.
The binary descriptors of these images are computed
offline and stored in a single binary matrix M_{DB} of
size N_D x L_D, where N_D is the number of database
images and L_D is the length of the descriptor in bits.
During each cycle of the real-time localization phase,
the descriptor of the current real-time image, q_t, is
matched with all database image descriptors. This is
done by computing the Hamming distance between q_t and
each row of the database matrix M_{DB}, which generates
a difference vector D_t of length N_D. Each element of
D_t is the Hamming distance between q_t and a
specific database image descriptor. Equation 1 shows
how to compute each element of the vector D_t:

    D_t(i) = \sum_{j=1}^{L_D} q_t(j) \oplus M_{DB}(i,j),   1 \le i \le N_D    (1)

where \oplus is the binary XOR operation.
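
As an illustration of equation 1, the difference vector can be computed with one XOR and one bit count per database image. The sketch below assumes that each descriptor is packed into a Python integer. The names `difference_vector` and `best_match` are hypothetical and are not taken from the released code.

```python
def difference_vector(q, database):
    """Hamming distance between descriptor q and every database descriptor (equation 1)."""
    return [bin(q ^ row).count("1") for row in database]

def best_match(q, database):
    """Index of the database descriptor with the smallest Hamming distance to q."""
    d = difference_vector(q, database)
    return min(range(len(d)), key=d.__getitem__)

# Usage with three 4-bit descriptors.
database = [0b0000, 0b1111, 0b1011]
query = 0b1011
print(difference_vector(query, database))
print(best_match(query, database))
```

Single-image matching, as evaluated later in the experiments, corresponds to taking the geo-tag of `best_match` directly, without any filtering.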

Vehicle localization algorithms cannot rely only on
an accurate image matching algorithm, because even
the most accurate image matching methods generate a
significant percentage of outliers. All previous
accurate visual localization algorithms integrate
odometry sensor measurements with image matching
results, in order to get an accurate and reliable
estimate of the vehicle location. A data fusion
algorithm has to be used to implement this integration.
SMART and ABLE integrate the measurements using the
distance-matrix method, which has a large computational
cost and limited accuracy. Particle-SMART uses a
particle filter, which is a statistical filter that has
higher accuracy and lower computational cost than the
distance-matrix method. The Kalman filter and the
Markov filter are two other statistical data fusion
algorithms that can be used in this problem. In our
research work, we adopt a modified version of the
Markov filter that has a lower computational cost than
the original Markov filter.

4.1 Markov filter 

The Markov filter is a non-linear statistical filter in
which a belief function Bel_t(x_t) is generated at each
filter cycle. The belief function is a probability
distribution over the current state. In our problem,
it represents the probability of finding the
vehicle at each location in the state space. Each filter
cycle is composed of two steps:

1) Propagation phase, in which the belief function
of the previous cycle, Bel_{t-1}(x_{t-1}), is propagated
in time using the odometry sensor measurements to
generate the prior distribution P_t(x_t), which is a
new estimated distribution of the vehicle location.
P_t(x_t) is computed as follows:

    P_t(x_t) = \sum_{\forall x_{t-1}} P(x_t | x_{t-1}) Bel_{t-1}(x_{t-1})    (2)

where P(x_t | x_{t-1}) is the state transition
probability, determined from the measurements of the
odometry sensor and its error covariance matrix.
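
A minimal sketch of the propagation step in equation 2 is given below. It assumes a one-dimensional state space in which state i is the i-th database frame, and it represents the state transition probability as a hypothetical `motion_kernel` dictionary that maps a displacement in frames to its probability. The final renormalization, which compensates for probability mass that leaves the ends of the route, is an added choice and is not part of equation 2.

```python
def propagate(belief, motion_kernel):
    """Prior P_t(x) = sum over x_prev of P(x | x_prev) * Bel_{t-1}(x_prev)."""
    n = len(belief)
    prior = [0.0] * n
    for x_prev, b in enumerate(belief):
        if b == 0.0:
            continue  # nothing to propagate from this state
        for shift, p in motion_kernel.items():
            x = x_prev + shift
            if 0 <= x < n:
                prior[x] += p * b
    total = sum(prior)
    # Renormalize so the prior is still a probability distribution (added choice).
    return [v / total for v in prior] if total > 0 else prior

# Usage: the vehicle most likely moved one frame forward, with some odometry noise.
belief = [0.0, 1.0, 0.0, 0.0, 0.0]
kernel = {0: 0.1, 1: 0.8, 2: 0.1}
print(propagate(belief, kernel))
```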

2) Update phase, in which the filter combines the
prior probability distribution P_t(x_t) with the sensor
measurement probability distribution (generated from
image matching in our case) to estimate the final
belief function of the current cycle, Bel_t(x_t), which
is computed as follows:

    Bel_t(x_t) = P(z_t | x_t) P_t(x_t) / P(z_t)    (3)

Due to the complexity of computing P(z_t), the
previous expression can be written alternatively as

    Bel_t(x_t) = \eta P(z_t | x_t) P_t(x_t)    (4)

where \eta is a normalization constant, and
P(z_t | x_t) is the measurement probability distribution
that represents the probability of having the current
measurement z_t (the descriptor of the current real-time
image) at each possible vehicle location in the state
space. This distribution can be calculated from the
difference vector D_t, shown in equation 1. The
element-wise reciprocal of this vector is normalized to
get the required probability distribution P(z_t | x_t).
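
The update step in equation 4 can be sketched as below, where the normalization constant is \eta = 1 / \sum_{x_t} P(z_t | x_t) P_t(x_t). The function names are hypothetical. The text does not say how a zero Hamming distance is handled, so this sketch clamps distances to at least 1 before taking the reciprocal. That clamp is an assumption.

```python
def measurement_probability(diff):
    """Normalized element-wise reciprocal of the difference vector D_t."""
    recip = [1.0 / max(d, 1) for d in diff]  # clamp avoids division by zero (assumption)
    total = sum(recip)
    return [r / total for r in recip]

def measurement_update(prior, diff):
    """Posterior belief Bel_t = eta * P(z_t | x_t) * P_t(x_t) (equation 4)."""
    likelihood = measurement_probability(diff)
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("prior and measurement have no overlapping support")
    eta = 1.0 / total
    return [eta * u for u in unnormalized]

# Usage: a uniform prior and a difference vector whose best match is index 1.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = measurement_update(prior, [10, 5, 10, 20])
print(posterior)
```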

The initial belief function at t = 0 is assumed to 
be a uniform distribution, since we do not have any 
prior information about the initial vehicle location. 

4.2 Improving Markov filter computational
efficiency

During each Markov filter cycle, the current real-time
image is matched with all database images
to compute the measurement probability distribution
P(z_t | x_t). This is a large computational cost,
which reduces the cycle rate of the filter. One possible
way to reduce the computational cost per cycle
is to divide the localization process into two phases:
a searching phase and a tracking phase. The algorithm
starts in the searching phase, where no solid
estimate of the current vehicle location has been
reached yet. In this phase, the real-time image is
matched with the entire database. The algorithm keeps
running in the searching phase until a solid estimate
of the current vehicle location is reached, and then it
switches to the tracking phase. In our experimental
work, the estimated location is considered to be a solid
estimate if the maximum peak in the generated belief
function Bel_t(x_t) is greater than 2.5 times the second
maximum peak.
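
The switching criterion can be sketched as follows. The text does not define how peaks are detected, so this sketch assumes that a peak is a strict local maximum of the belief function. The function names are hypothetical.

```python
def local_peaks(belief):
    """Indices of strict local maxima of the belief function (assumed peak definition)."""
    n = len(belief)
    peaks = []
    for i, v in enumerate(belief):
        left = belief[i - 1] if i > 0 else float("-inf")
        right = belief[i + 1] if i < n - 1 else float("-inf")
        if v > left and v > right:
            peaks.append(i)
    return peaks

def is_solid_estimate(belief, ratio=2.5):
    """True if the highest peak exceeds `ratio` times the second highest peak."""
    peaks = sorted(local_peaks(belief), key=lambda i: belief[i], reverse=True)
    if not peaks:
        return False  # e.g. the uniform initial belief has no peak
    if len(peaks) == 1:
        return True   # a single peak has no competitor
    return belief[peaks[0]] > ratio * belief[peaks[1]]

# Usage: the first belief has a dominant peak, the second does not.
print(is_solid_estimate([0.05, 0.60, 0.05, 0.20, 0.10]))
print(is_solid_estimate([0.05, 0.40, 0.05, 0.30, 0.20]))
```

With this peak definition, a uniform belief never triggers the switch, which matches the uniform initialization at t = 0.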

In the tracking phase, the real-time image is matched
with a small subset of the database images. This subset
contains all database images whose geo-tags are
near the current expected vehicle location.
This modification greatly reduces the
computational cost. In our experimental work, image
matching in the tracking phase is done only with
database frames whose geographical locations are within
100 meters of the expected vehicle location. In the
tracking phase, all elements of the measurement
probability vector P(z_t | x_t) that are located outside
the 100-meter bounds are set to zero.

Algorithm 1 illustrates the steps of the searching
phase. The inputs of the algorithm are the real-time
image descriptor at each filter cycle, q_t, where
t = 1, 2, ..., and the database descriptor matrix M_{DB}.
The output of the algorithm is the posterior belief
function Bel_t at each filter cycle. The first step in
the algorithm is the propagation of the previous belief
function to compute the prior distribution P_t. Then,
it computes the difference vector D_t and its
element-wise reciprocal vector R_t, which is used to
compute the measurement probability P(z_t = q_t | x_t).
Finally, the measurement update is done to compute
Bel_t. The algorithm keeps running until the ratio
between the first two peaks in Bel_t exceeds 2.5.

Algorithm 2 illustrates the tracking phase. It has
the same inputs and output as the searching phase
algorithm. Propagation is done normally in the first
step to compute P_t. A window of width W is centered
at the location of the maximum peak of P_t. Only the
elements of the D_t vector that fall inside this window
are computed, i.e. the matching is done only with the
database images that were captured at places near
the prior estimated location. The measurement
probability is then computed, and the measurement
update is done to compute Bel_t. X(t) is the location
of the maximum peak of Bel_t, and it is the estimated
vehicle location for the current cycle. A new window
of width W is centered at X(t), and all Bel_t elements
outside this window are discarded. Then, Bel_t is
normalized to a probability distribution, to be used in
the next cycle.
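
One tracking-phase update, starting from an already propagated prior, can be sketched as below. This is an illustration under stated assumptions rather than the released implementation: descriptors are packed into Python integers, state i is the i-th database frame, the default `window=200` corresponds to 100 meters on each side at one frame per meter, and zero Hamming distances are clamped to 1. The function name is hypothetical.

```python
def tracking_update(prior, q, database, window=200):
    """Windowed measurement update; returns the new belief and the estimated location."""
    n = len(prior)
    half = window // 2
    center = max(range(n), key=prior.__getitem__)  # maximum peak of the prior
    lo, hi = max(0, center - half), min(n, center + half)

    # Match only against database frames inside the window; others get zero probability.
    recip = [0.0] * n
    for i in range(lo, hi):
        d = bin(q ^ database[i]).count("1")
        recip[i] = 1.0 / max(d, 1)  # clamp avoids division by zero (assumption)
    total = sum(recip)
    likelihood = [r / total for r in recip]

    # Measurement update, then keep only the belief near the new maximum peak.
    belief = [l * p for l, p in zip(likelihood, prior)]
    x_hat = max(range(n), key=belief.__getitem__)
    belief = [b if abs(i - x_hat) < half else 0.0 for i, b in enumerate(belief)]
    norm = sum(belief)
    return [b / norm for b in belief], x_hat

# Usage with ten synthetic descriptors; the query equals database frame 6.
database = [(1 << (4 * i)) - 1 for i in range(10)]
prior = [0.0, 0.0, 0.0, 0.0, 0.2, 0.5, 0.3, 0.0, 0.0, 0.0]
belief, x_hat = tracking_update(prior, database[6], database, window=4)
print(x_hat)
```

Only `window` Hamming distances are computed per cycle instead of N_D, which is where the cycle-rate gain of the tracking phase comes from.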

In the rest of the paper, the algorithm that uses the
original Markov filter is referred to as RLDB/MF,
and the algorithm that uses the faster modified
Markov filter is referred to as RLDB/MMF.

5 Experiments 

Algorithm 1: Searching phase

Input: Bel_{t=0}, M_{DB}
Output: Bel_t \forall t > 0

loop
    P_t(x_t) = \sum_{x_{t-1}=1}^{N_D} p(x_t | x_{t-1}) Bel_{t-1}(x_{t-1})    {Propagation}
    for i = 1 to N_D do
        D_t(i) = q_t \oplus M_{DB}(i)    {Difference vector}
        R_t(i) = 1 / D_t(i),  D_t(i) \neq 0
    end for
    P(z_t = q_t | x_t = i) = R_t(i) / \sum_{j=1}^{N_D} R_t(j),  i = 1:N_D    {Measurement probability}
    Bel_t(x_t) = p(z_t | x_t) P_t(x_t)    {Measurement update}
    Bel_t = Bel_t / \sum_{x_t} Bel_t(x_t)    {Normalize Bel_t}
    Find x_t and s_t    {1st and 2nd peak locations}
    if Bel_t(x_t) > 2.5 Bel_t(s_t) then
        Switch to tracking phase
    end if
    t = t + 1
end loop


Algorithm 2: Tracking phase

Input: Bel_t from searching phase, M_{DB}
Output: Bel_t \forall t \in Tracking

loop
    P_t(x_t) = \sum_{x_{t-1}=1}^{N_D} p(x_t | x_{t-1}) Bel_{t-1}(x_{t-1})    {Propagation}
    Find X, the maximum peak location of P_t(x_t)
    for i = X - W/2 to X + W/2 - 1 do
        D_t(i) = q_t \oplus M_{DB}(i)  \forall i \in window
        R_t(i) = 1 / D_t(i),  D_t(i) \neq 0
    end for
    R_t = R_t / \sum_{i \in window} R_t(i)    {Normalize R_t vector}
    P(z_t = q_t | x_t = i) = R_t(i) if i \in window, 0 otherwise
    Bel_t(x_t) = p(z_t | x_t) P_t(x_t)    {Measurement update}
    Find X(t)    {Maximum peak location of Bel_t}
    Bel_t(x_t) = Bel_t(x_t) if |x_t - X(t)| < W/2, 0 otherwise    {Discard any probability outside the window}
    Bel_t = Bel_t / \sum_{x_t} Bel_t(x_t)    {Normalize Bel_t}
    t = t + 1
end loop


In the experiments, there is a pre-stored database that
contains images showing the vehicle surroundings at the
different places of the navigation area. This database
can be created using a vehicle that carries a camera,
an odometry sensor, and a GPS receiver. During the
database creation phase, the vehicle travels to cover all
roads in the experiment area while capturing a frame
every one meter of displacement. The odometry sensor
is used to determine the right moment to capture
each new image, and the GPS receiver is used to
augment each frame with a geo-tag. After the database
frames are collected, they are down-scaled to 64 x 64
pixels, their patch-illumination-normalized version is
computed [10], and finally their binary descriptors are
computed and stored.

In the real-time localization phase, the vehicle
carries a camera and an odometry sensor, while the GPS
receiver is not used. Every one meter of displacement,
it captures a new frame and computes the binary
descriptor of its down-scaled, illumination-normalized
version. Then, the descriptor is matched with the
database frames to generate the measurement probability
vector P(z_t | x_t).


5.1 Datasets 

Each dataset used in the experiments contains two
videos: one recorded during the day and one
recorded at night, which is the extreme case
of appearance variation between the two image
sequences. Two datasets were used in the experimental
work: the Highway dataset and the CBD dataset.

The Highway dataset is a vehicle visual localization
dataset that measures localization accuracy in the
presence of lateral camera translations. It contains
two panoramic videos, recorded by a camera fixed
on top of a vehicle that travels a 5 km trip on
a 4-lane highway on the Gold Coast, Australia. The
first video is recorded at night, and the vehicle
stays in the same lane during the entire trip. The
second video is recorded during the day, and the
vehicle changes lanes many times, causing a significant
change in the camera viewpoint. The major challenges
in this dataset are the large appearance change
between day and night images and the large lateral
camera translation between the images of the two
videos.

The CBD dataset is a vehicle visual localization dataset
for navigation in a network of interconnected roads.
It also contains two panoramic videos of two trips
through the roads of the Brisbane CBD, Australia. The
first video is recorded during the day for a 3.6 km
trip. The second video is recorded at night for a 6 km
trip that contains loops and repeated paths. This
dataset represents a realistic scenario for the vehicle
localization problem, as the paths taken by the vehicle
in the two trips are different.

Successive frames in the videos of the two datasets
are separated by 1 meter of displacement, where the
odometry sensor used while recording the videos is
an OBD-II board [16]. The two datasets are publicly
available at the website of Queensland University of
Technology [13]. Figures 2 and 3 show satellite images
of the navigation areas of the Highway and CBD
datasets, respectively [16]. The figures show the
trajectories taken by the vehicle during the database
image acquisition trip and during the real-time
localization trip.


5.2 Results 

Simulated experiments were done to measure the
localization accuracy on the two datasets. In the
experiments, the descriptors of the database frames
were computed offline, before the beginning of the
experiment, while the descriptors of the real-time
video frames were computed during the experiment.
Since the RLDB descriptor is generated from random
image cells, the results change slightly each
time the experiment is repeated with a different cell
distribution. Accordingly, the experiments were
repeated many times, each with a different cell
distribution, and the results stated below are the
averages. The RLDB descriptor used in the experiments
has a length of 12000 bits, i.e. 4000 random cell pairs
are generated per image. The estimated vehicle location
is considered to be correct if it is within 10 meters
of the ground-truth location.

Figures 4 and 5 compare the localization accuracy
achieved by single-image matching with the localization
accuracy achieved by the proposed RLDB/MMF
algorithm, for the Highway and CBD datasets,
respectively. The figures show the ground-truth curve in
blue, and the correctly estimated vehicle locations as
red "x" marks over that curve (they appear
as a thick line). These figures emphasize the
importance of the data fusion algorithm in the visual
localization problem. It is clear that the "x" marks
(correctly estimated locations) in the RLDB/MMF figure
cover the whole curve and fill the gaps that appear in
the single-image matching figure. With single-image
matching, the percentages of cycles that estimate
correct locations are 52.89 and 70.67 in the Highway and
CBD datasets, respectively. With MMF, the
percentages of correctly estimated locations
increase to 99.75 and 98.42, respectively.


Figure 4: Results of Highway dataset. Top: correctly
matched frames using RLDB single-image matching.
Bottom: correctly estimated vehicle locations using
RLDB/MMF algorithm. Percentages of correctly
estimated locations are 52.89 and 99.75, respectively.

Automatic Control and System Engineering Journal, ISSN 1687-4811, Volume 19, Issue 2, ICGST, Delaware, USA, December 2019

Figure 2: Satellite image for Highway dataset vehicle trajectory [16]. In this dataset, the vehicle took the same
trajectory in both the database image acquisition trip and the real-time localization trip. The trajectory is
marked by a blue line in the image.

Figure 3: Satellite images for CBD dataset vehicle trajectories [16]. The database acquisition trajectory is shown
in the left image, marked by a blue line, where the images were collected in four vehicle trips to cover all roads.
In these trips, the vehicle moved between points 1 to 5, 6 to 9, 10 to 11, and 12 to 13, respectively. The real-time
localization trajectory is shown in the right image, marked by a red line, where the vehicle moved from point 1
to point 22, as highlighted by the arrows in the image.

Tables 1 and 2 show the results of our algorithm
on the Highway and CBD datasets, respectively. The
tables also show the results when the original
Markov filter is used, and when the LDB descriptor is
used instead of RLDB. Additionally, the tables show
the results of other state-of-the-art vehicle
localization algorithms: SeqSLAM, SMART, and
Particle-SMART. The results of these three algorithms
on the two datasets are taken from [16]. Each of the two
tables shows: η, the percentage of filter cycles that
estimate correct vehicle locations; D_gap, the
maximum continuous uncertainty distance in meters,
over which the algorithm is unable to estimate the
correct vehicle location; and the cycle rate, which
is the number of processed real-time images per second.
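
The two accuracy metrics can be computed as in the sketch below. It assumes that each location is expressed as a one-dimensional distance in meters along the route, that one filter cycle corresponds to one meter of travel, and that "within 10 meters" includes exactly 10 meters. The function name and the `step` parameter are hypothetical.

```python
def localization_metrics(estimated, truth, tol=10.0, step=1.0):
    """Return (eta, d_gap): percentage of correct cycles and longest incorrect stretch in meters."""
    correct = [abs(e - g) <= tol for e, g in zip(estimated, truth)]
    eta = 100.0 * sum(correct) / len(correct)
    longest = run = 0
    for ok in correct:
        run = 0 if ok else run + 1   # length of the current run of incorrect cycles
        longest = max(longest, run)
    return eta, longest * step

# Usage: ten cycles, with cycles 3 to 5 and cycle 8 localized 50 m away from the truth.
truth = list(range(10))
estimated = list(truth)
for i in (3, 4, 5, 8):
    estimated[i] += 50
print(localization_metrics(estimated, truth))
```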


In the Highway dataset experiments, the modified
Markov filter starts in the searching phase, which lasts
on average for 4 cycles with an average cycle rate of
13.57 Hz. The filter then switches to the tracking phase
for the rest of the experiment, which runs at an
average cycle rate of 43.4 Hz. In the CBD dataset
experiments, the searching phase lasts on average for 9
cycles with an average cycle rate of 17.66 Hz. Then, the
tracking phase runs at an average cycle rate of
44.15 Hz.

The hardware platform used to implement our
experiments is a 2.2 GHz Intel Core i7 processor with 8
GB of DDR3 RAM. This platform is similar to
the platform used in [16], in order to have a fair
comparison with the cycle rates of the other algorithms.
The code that implements our algorithm and measures
its accuracy on the two datasets is publicly
available at [1].

Figure 5: Results of CBD dataset. Top: correctly
matched frames using RLDB single-image matching.
Bottom: correctly estimated vehicle locations using
RLDB/MMF algorithm. Percentages of correctly
estimated locations are 70.67 and 98.42, respectively.


Algorithm     η (%)     D_gap (m)    Cycle rate (Hz)
RLDB/MMF      99.75     7            43.40
RLDB/MF       99.83     7            11.29
LDB/MMF       99.72     12           43.03
SMART-PF      98        95           2.90
SMART         86        147          5.80
Seq-SLAM      43        767          12.0

Table 1: Results of different algorithms on Highway
dataset, where η is the percentage of the correctly
estimated locations and D_gap is the maximum continuous
uncertainty distance in meters.


Algorithm     η (%)     D_gap (m)    Cycle rate (Hz)
RLDB/MMF      98.42     35           44.15
RLDB/MF       97.98     56           14.85
LDB/MMF       91.86     141          44.07
SMART-PF      76        158          9.06
SMART         61        580          7.72
Seq-SLAM      55        620          13.5

Table 2: Results of different algorithms on CBD
dataset.

6 Discussion 

The experimental results show that the proposed
vehicle localization algorithm outperforms the
algorithms it is compared with. The algorithm succeeds
in estimating the correct vehicle location for most of
the trip. As shown in tables 1 and 2, the proposed
algorithm outperforms the other state-of-the-art
algorithms in both localization accuracy and cycle rate.
This improvement in localization accuracy is mainly
attributed to the use of the RLDB descriptor in image
matching, which has greater matching accuracy than LDB
and SAD.

Comparing the cycle rates of RLDB/MMF and
RLDB/MF makes it clear that the proposed modification
of dividing the localization process into a searching
phase and a tracking phase leads to a significant
improvement in computational efficiency. However,
it is important to mention that this modification may
lead to the kidnapped robot problem [17] if the tracking
phase starts at a wrongly estimated vehicle location. In
that case, the algorithm never finds the correct
vehicle location unless it switches back to the
searching phase. In our experiments, we avoid this
problem by the restrictive condition for switching to
the tracking phase, which depends on the ratio between
the first two peaks of the belief function Bel_t.

Although RLDB performs poorly in the presence
of zooming and rotation [6], the localization algorithm
achieves high localization accuracy on the Highway
dataset, which has a large lateral camera translation
between its database and real-time frames. This is
attributed to the use of a panoramic camera in the
experiments. The right and left parts of the panoramic
frame are significantly affected by this camera
translation, due to the presence of nearby objects
(buildings) at the two sides of the street. However,
the front and rear parts of the frame are only slightly
affected, because the objects that appear in these
directions are usually far away. This enables
the algorithm to keep tracking the vehicle location
based on these last two sub-frames.

The results also demonstrate that the algorithms
based on statistical data fusion, like Particle-SMART,
LDB/MMF, and our algorithm, have higher accuracy
than linear algorithms based on distance-matrix
matching, like SeqSLAM and SMART.


7 Conclusion 

The paper introduced a novel algorithm for fast and
precise vehicle visual localization. The algorithm is
based on a Markov localization filter that integrates
odometry sensor measurements with image matching
results to estimate the vehicle location. For accurate
and fast image matching, the algorithm uses the RLDB
binary image descriptor. The experiments presented
in the paper compared the proposed algorithm with
other modern vehicle visual-localization algorithms,
and the results demonstrated that the proposed
algorithm outperforms them. The code that implements
our algorithm and the experiments presented in the
paper is shared online.